 transfer regret


Meta Learning in Bandits within Shared Affine Subspaces

arXiv.org Machine Learning

We study the problem of meta-learning several contextual stochastic bandits tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected regret over the encountered bandits. We propose and theoretically analyze two strategies that solve the problem: One based on the principle of optimism in the face of uncertainty and the other via Thompson sampling. Our framework is generic and includes previously proposed approaches as ...

In the applications mentioned above, the tasks often relate to each other despite being different. For instance, subgroups of patients have comparable features. As another example, holidays or discount periods promote similar interests in the products of an e-commerce website. That observation motivates us to look beyond a single task to uncover a relation between different ones to accelerate learning on newly encountered tasks. That problem, referred to as meta-learning or learning-to-learn (LTL), has mainly appeared in the offline learning literature so far (Hutter et al., 2019). Nevertheless, an emergent body of literature combines LTL and MAB to accelerate learning and reduce the average regret per task (Cella et al., 2020; Cella and Pontil, 2021; Bilaj et al., 2023).
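The affine-subspace idea can be illustrated with a small sketch. It is not the authors' method: the abstract describes online principal component analysis, whereas this example swaps in a batch PCA computed with an SVD, and the names `fit_affine_subspace` and `project`, the dimensions, and the synthetic task parameters are all assumptions made for illustration.

```python
import numpy as np

def fit_affine_subspace(thetas, k):
    """Fit a k-dimensional affine subspace to task parameters (one row per task).

    Returns the mean (the affine offset) and a d x k matrix whose orthonormal
    columns are the top-k principal directions of the centered rows.
    """
    mean = thetas.mean(axis=0)
    _, _, vt = np.linalg.svd(thetas - mean, full_matrices=False)
    return mean, vt[:k].T

def project(theta, mean, basis):
    """Orthogonally project theta onto the affine subspace mean + span(basis)."""
    return mean + basis @ (basis.T @ (theta - mean))

# Synthetic (hypothetical) tasks concentrated around a 1-D affine subspace of R^3.
rng = np.random.default_rng(0)
offset = np.array([1.0, -2.0, 0.5])
direction = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2.0)
coeffs = rng.normal(size=(20, 1))
thetas = offset + coeffs @ direction.T + 0.01 * rng.normal(size=(20, 3))

mean, basis = fit_affine_subspace(thetas, k=1)
theta_new = offset + 3.0 * direction[:, 0]
theta_proj = project(theta_new, mean, basis)
```

In a bandit setting, a learner could restrict or regularize its estimate for a new task toward such a subspace. How the paper does this is not stated in the abstract.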


Meta Learning MDPs with Linear Transition Models

arXiv.org Artificial Intelligence

We study meta-learning in Markov Decision Processes (MDPs) with linear transition models in the undiscounted episodic setting. Under a task-sharedness metric based on model proximity, we study task families characterized by a distribution over models specified by a bias term and a variance component. We then propose BUC-MatrixRL, a version of the UC-MatrixRL algorithm, and show that it can meaningfully leverage a set of sampled training tasks to quickly solve a test task sampled from the same task distribution by learning an estimator of the bias parameter of the task distribution. The analysis leverages and extends results from the learning-to-learn linear regression and linear bandit settings to the more general case of MDPs with linear transition models. We prove that, compared to learning the tasks in isolation, BUC-MatrixRL provides significant improvements in the transfer regret for high-bias, low-variance task distributions.


Meta-learning with Stochastic Linear Bandits

arXiv.org Machine Learning

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
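To make the regularizer concrete, here is a minimal sketch of a ridge estimate biased toward a vector h, that is, the minimizer of ||X θ − y||² + λ ||θ − h||². It covers only the estimation step; the confidence sets and optimistic action selection of OFUL are omitted. The function name `biased_ridge`, the parameter `lam`, and the toy data are assumptions for illustration, not the paper's code.

```python
import numpy as np

def biased_ridge(X, y, h, lam):
    """Closed-form minimizer of ||X @ theta - y||^2 + lam * ||theta - h||^2.

    Setting the gradient to zero gives
    (X^T X + lam * I) theta = X^T y + lam * h.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * h
    return np.linalg.solve(A, b)

# Toy (hypothetical) task: few noisy observations of a linear reward.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(5, 3))
y = X @ theta_true + 0.1 * rng.normal(size=5)

h = theta_true + 0.05                                    # bias vector assumed close to the task
theta_biased = biased_ridge(X, y, h, lam=10.0)           # shrinks toward h
theta_plain = biased_ridge(X, y, np.zeros(3), lam=10.0)  # standard ridge, shrinks toward 0
```

With no observations the estimate equals h exactly, and setting h to zero gives standard ridge regression. This is consistent with the abstract's claim that a well-estimated bias helps most when the variance of the task distribution is small.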